298 research outputs found

    A goodness-of-fit test for the functional linear model with scalar response

    In this work, a goodness-of-fit test for the null hypothesis of a functional linear model with scalar response is proposed. The test is based on a generalization to the functional framework of a previous one, designed for the goodness-of-fit of regression models with multivariate covariates using random projections. The test statistic is easy to compute using geometrical and matrix arguments, and simple to calibrate in its distribution by a wild bootstrap on the residuals. The finite sample properties of the test are illustrated by a simulation study for several types of basis and under different alternatives. Finally, the test is applied to two datasets for checking the assumption of the functional linear model and a graphical tool is introduced. Supplementary materials are available online. Comment: Paper: 17 pages, 2 figures, 3 tables. Supplementary material: 8 pages, 6 figures, 10 tables.
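
    The calibration step mentioned in this abstract, a wild bootstrap on the residuals, can be illustrated in a much simpler setting. The sketch below is not the paper's test: it assumes a scalar covariate instead of a functional one, Rademacher (random sign) multipliers, and a toy statistic based on partial sums of residuals. The names `fit_line`, `statistic` and `wild_bootstrap_pvalue` are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def statistic(x, resid):
    """Toy statistic (an assumption, not the paper's): largest absolute
    partial sum of the residuals ordered by x, scaled by sqrt(n)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    total, worst = 0.0, 0.0
    for i in order:
        total += resid[i]
        worst = max(worst, abs(total))
    return worst / len(x) ** 0.5


def wild_bootstrap_pvalue(x, y, n_boot=500, seed=0):
    """Approximate the null distribution of the statistic by a wild bootstrap."""
    rng = random.Random(seed)
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    observed = statistic(x, resid)
    exceed = 0
    for _ in range(n_boot):
        # Rademacher multipliers: flip the sign of each residual independently.
        y_star = [fi + ri * rng.choice((-1.0, 1.0))
                  for fi, ri in zip(fitted, resid)]
        # Refit the null (linear) model on the bootstrap responses.
        a_s, b_s = fit_line(x, y_star)
        resid_star = [ys - (a_s + b_s * xi) for ys, xi in zip(y_star, x)]
        if statistic(x, resid_star) >= observed:
            exceed += 1
    return exceed / n_boot
```

    A small p-value from `wild_bootstrap_pvalue(x, y)` suggests that the linear model does not fit; the multiplier distribution and the statistic would both differ in the functional setting described above.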

    Efficient Bayesian hierarchical functional data analysis with basis function approximations using Gaussian-Wishart processes

    Functional data are defined as realizations of random functions (mostly smooth functions) varying over a continuum, which are usually collected with measurement errors on discretized grids. In order to accurately smooth noisy functional observations and deal with the issue of high-dimensional observation grids, we propose a novel Bayesian method based on the Bayesian hierarchical model with a Gaussian-Wishart process prior and basis function representations. We first derive an induced model for the basis-function coefficients of the functional data, and then use this model to conduct posterior inference through Markov chain Monte Carlo. Compared to the standard Bayesian inference, which suffers from a serious computational burden and instability when analyzing high-dimensional functional data, our method greatly improves the computational scalability and stability, while inheriting the advantage of simultaneously smoothing raw observations and estimating the mean-covariance functions in a nonparametric way. In addition, our method can naturally handle functional data observed on random or uncommon grids. Simulation and real studies demonstrate that our method produces results similar to those of the standard Bayesian inference with low-dimensional common grids, while efficiently smoothing and estimating functional data with random and high-dimensional observation grids where the standard Bayesian inference fails. In conclusion, our method can efficiently smooth and estimate high-dimensional functional data, providing one way to resolve the curse of dimensionality for Bayesian functional data analysis with Gaussian-Wishart processes. Comment: Under review

    Functional kernel estimators of conditional extreme quantiles

    We address the estimation of "extreme" conditional quantiles, i.e., quantiles whose order converges to one as the sample size increases. Conditions on the rate of convergence of their order to one are provided to obtain asymptotically Gaussian distributed kernel estimators. A Weissman-type estimator and kernel estimators of the conditional tail-index are derived, permitting the estimation of extreme conditional quantiles of arbitrary order. Comment: arXiv admin note: text overlap with arXiv:1107.226
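
    For orientation, a Weissman-type extrapolation generally takes the form below. This is the classical construction written conditionally on a covariate value; the notation ($\alpha_n$, $\beta_n$, $\hat q$, $\hat\gamma$) is assumed here and is not taken from the paper.

```latex
\hat q(\beta_n \mid x)
  = \hat q(\alpha_n \mid x)
    \left( \frac{1 - \alpha_n}{1 - \beta_n} \right)^{\hat\gamma(x)},
\qquad \alpha_n < \beta_n < 1 .
```

    Here $\hat q(\alpha_n \mid x)$ stands for a kernel estimator of the conditional quantile at an intermediate order $\alpha_n$, and $\hat\gamma(x)$ for an estimator of the conditional tail-index; the ratio extrapolates to the more extreme order $\beta_n$.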

    Approximating nonequilibrium processes using a collection of surrogate diffusion models

    The surrogate process approximation (SPA) is applied to model the nonequilibrium dynamics of a reaction coordinate (RC) associated with the unfolding and refolding processes of a deca-alanine peptide at 300 K. The RC dynamics, which correspond to the evolution of the end-to-end distance of the polypeptide, are produced by steered molecular dynamics (SMD) simulations and approximated using overdamped diffusion models. We show that the collection of (estimated) SPA models contains structural information "orthogonal" to the RC monitored in this study. Functional data analysis ideas are used to correlate functions associated with the fitted SPA models with the work done on the system in SMD simulations. It is demonstrated that the shape of the nonequilibrium work distributions for the unfolding and refolding processes of deca-alanine can be predicted with functional data analysis ideas using a relatively small number of simulated SMD paths for calibrating the SPA diffusion models. Comment: 13 pages, 7 figures
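
    To make the phrase "overdamped diffusion model" concrete, the sketch below simulates one path of a generic overdamped diffusion dX = drift(X) dt + sigma dW with the Euler-Maruyama scheme. It is not the SPA estimation procedure; the harmonic drift, the constant noise level and the function names are assumptions chosen for illustration.

```python
import math
import random


def simulate_overdamped(x0, drift, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama path for dX = drift(X) dt + sigma dW (constant sigma)."""
    rng = random.Random(seed)
    path = [x0]
    x = x0
    for _ in range(n_steps):
        # Deterministic drift step plus a Gaussian increment of variance sigma^2 * dt.
        x = x + drift(x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def harmonic_drift(x, k=1.0, center=0.0):
    """Hypothetical drift: restoring force of a harmonic well centered at `center`."""
    return -k * (x - center)
```

    For example, `simulate_overdamped(2.0, harmonic_drift, sigma=0.5, dt=0.01, n_steps=1000)` returns a list of 1001 positions relaxing toward the well center while fluctuating around it.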

    Forecasting basketball players’ performance using sparse functional data

    Statistics and analytic methods are becoming increasingly important in basketball. In particular, predicting players’ performance using past observations is a considerable challenge. The purpose of this study is to forecast the future behavior of basketball players. The available data are sparse functional data, which are very common in sports. So far, however, no forecasting method designed for sparse functional data has been used in sports. A methodology based on two methods to handle sparse and irregular data, together with the analogous method and functional archetypoid analysis, is proposed. Results in comparison with traditional methods show that our approach is competitive and additionally provides prediction intervals. The methodology can also be used in other sports when sparse longitudinal data are available.

    Adaptive estimation in circular functional linear models

    We consider the problem of estimating the slope parameter in circular functional linear regression, where scalar responses Y1,...,Yn are modeled in dependence of 1-periodic, second order stationary random functions X1,...,Xn. We consider an orthogonal series estimator of the slope function, obtained by replacing the first m theoretical coefficients of its development in the trigonometric basis by adequate estimators. We propose a model selection procedure for m in a set of admissible values, by defining a contrast function minimized by our estimator and a theoretical penalty function; this first step assumes the degree of ill-posedness to be known. Then we generalize the procedure to a random set of admissible m's and a random penalty function. The resulting estimator is completely data driven and automatically reaches what is known to be the optimal minimax rate of convergence, in terms of a general weighted L2-risk. This means that we provide adaptive estimators of both the slope function and its derivatives.

    Lazy Lasso for local regression

    Locally weighted regression is a technique that predicts the response for new data items from their neighbors in the training data set, where closer data items are assigned higher weights in the prediction. However, the original method may suffer from overfitting and fail to select the relevant variables. In this paper we propose combining a regularization approach with locally weighted regression to achieve sparse models. Specifically, the lasso is a shrinkage and selection method for linear regression. We present an algorithm that embeds lasso in an iterative procedure that alternately computes weights and performs lasso-wise regression. The algorithm is tested on three synthetic scenarios and two real data sets. Results show that the proposed method outperforms linear and local models for several kinds of scenarios.
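
    One possible reading of this alternating scheme is sketched below; it is not the paper's algorithm. The assumptions are a Gaussian kernel for the weights, distances computed only over the variables currently selected by the lasso, and a plain coordinate-descent solver. All function names and default parameters are hypothetical.

```python
import math


def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, w, lam, n_iter=200):
    """Coordinate descent for
    min 0.5 * sum_i w_i * (y_i - b0 - x_i . b)^2 + lam * ||b||_1."""
    p = len(X[0])
    b0, b = 0.0, [0.0] * p
    wsum = sum(w)
    for _ in range(n_iter):
        # The intercept is not penalized: weighted mean of the current residuals.
        b0 = sum(wi * (yi - sum(bj * xj for bj, xj in zip(b, xi)))
                 for wi, xi, yi in zip(w, X, y)) / wsum
        for j in range(p):
            rho, denom = 0.0, 0.0
            for wi, xi, yi in zip(w, X, y):
                # Prediction without the contribution of feature j.
                pred_wo_j = b0 + sum(b[k] * xi[k] for k in range(p) if k != j)
                rho += wi * xi[j] * (yi - pred_wo_j)
                denom += wi * xi[j] ** 2
            b[j] = soft_threshold(rho, lam) / denom if denom > 0 else 0.0
    return b0, b


def local_lasso_predict(X, y, query, lam=0.1, bandwidth=1.0, n_rounds=3):
    """Alternate between kernel weights around `query` and a weighted lasso fit."""
    p = len(query)
    active = list(range(p))  # start with every variable in the distance
    b0, b = 0.0, [0.0] * p
    for _ in range(n_rounds):
        # Gaussian kernel weights using only the currently active variables.
        w = [math.exp(-sum((xi[j] - query[j]) ** 2 for j in active)
                      / (2.0 * bandwidth ** 2)) for xi in X]
        b0, b = weighted_lasso(X, y, w, lam)
        new_active = [j for j in range(p) if b[j] != 0.0]
        if not new_active or new_active == active:
            break
        active = new_active
    prediction = b0 + sum(bj * qj for bj, qj in zip(b, query))
    return prediction, b
```

    Calling `local_lasso_predict(X, y, query)` returns the local prediction together with the coefficient vector, whose zero entries mark variables dropped near the query point. Restricting the distance to the selected variables is a design choice of this sketch that keeps irrelevant variables from distorting the neighborhood.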

    Model-Based Clustering and Classification of Functional Data

    The problem of complex data analysis is a central topic of modern statistical science and learning systems and is becoming of broader interest with the increasing prevalence of high-dimensional data. The challenge is to develop statistical models and autonomous algorithms that are able to acquire knowledge from raw data for exploratory analysis, which can be achieved through clustering techniques, or to make predictions of future data via classification (i.e., discriminant analysis) techniques. Latent data models, including mixture model-based approaches, are among the most popular and successful approaches in both the unsupervised context (i.e., clustering) and the supervised one (i.e., classification or discrimination). Although traditionally tools of multivariate analysis, they are growing in popularity when considered in the framework of functional data analysis (FDA). FDA is the data analysis paradigm in which the individual data units are functions (e.g., curves, surfaces), rather than simple vectors. In many areas of application, the analyzed data are indeed often available in the form of discretized values of functions or curves (e.g., time series, waveforms) and surfaces (e.g., 2d-images, spatio-temporal data). This functional aspect of the data adds difficulties compared to the case of a classical multivariate (non-functional) data analysis. We review and present approaches for model-based clustering and classification of functional data. We derive well-established statistical models along with efficient algorithmic tools to address problems regarding the clustering and the classification of these high-dimensional data, including their heterogeneity, missing information, and dynamical hidden structure. The presented models and algorithms are illustrated on real-world functional data analysis problems from several application areas.